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Abstract 

Automated tracking of animal movement allows analyses that 
would not otherwise be possible by providing great quantities 
of data. The additional capability of tracking in realtime — 
with minimal latency — opens up the experimental possibility 
of manipulating sensory feedback, thus allowing detailed ex- 
plorations of the neural basis for control of behavior. Here we 
describe a new system capable of tracking the position and 
body orientation of animals such as flies and birds. The sys- 
item operates with less than 40 msec latency and can track 
multiple animals simultaneously. To achieve these results, a 
multi target tracking algorithm was developed based on the 
Extended Kalman Filter and the Nearest Neighbor Standard 
Filter data association algorithm. In one implementation, an 
'eleven camera system is capable of tracking three flies simul- 
taneously at 60 frames per second using a gigabit network of 
nine standard Intel Pentium 4 and Core 2 Duo computers. 
This manuscript presents the rationale and details of the al- 
gorithms employed and shows three implementations of the 
system. An experiment was performed using the tracking sys- 
tem to measure the effect of visual contrast on the flight speed 
of Drosophila melanogaster. At low contrasts, speed is more 
variable and faster on average than at high contrasts. Thus, 
the system is already a useful tool to study the neurobiology 
and behavior of freely flying animals. If combined with other 
techniques, such as 'virtual reality'-type computer graphics 
or genetic manipulation, the tracking system would offer a 
powerful new way to investigate the biology of flying animals. 

Keywords: multitargct tracking; Kalman filter; computer 
vision; data association; Drosophila; insects; birds; humming- 
birds; neurobiology; animal behavior 

1 Introduction 

Much of what wc know about the visiial guidance of flight 
(Schilstra and Hatcren, 1999: Kern ct al., 2005; Srinivasan 
et al., 1996; Tammero and Dickinson, 2002), aerial pursuit 
(Land and CoUett, 1974; Buelthoff ct al., 1980; Wehrhahn 
ct al., 1982; Wagner, 1986; Mizutani ct al., 2003), olfactory 
search algorithms (Frye et al., 2003; Budick and Dickinson, 
2006), and control of aerodynamic force generation (David, 
1978; Fry et al., 2003) is based on experiments in which an 
insect was tracked during flight. To facilitate these types of 
studies and to enable new ones, we created a new, automated 



animal tracking system. A signiflcant motivation was to cre- 
ate a system capable of robustly gathering large quantities 
of accurate data in a highly automated fashion in a flexi- 
ble way. The realtime nature of the system enables experi- 
ments in which an animal's own movement is used to control 
the physical environment, allowing virtual-reality or other dy- 
namic stimulus regimes to investigate the feedback-based con- 
trol performed by the nervous system. Furthermore, the abil- 
ity to easily collect flight trajectories facilitate data analysis 
and behavioral modeling using machine learning approaches 
that require large amounts of data. 

Our primary innovation is the use of arbitrary numbers of 
inexpensive cameras for markerless, realtime tracking of mul- 
tiple targets. Typically, cameras with relatively high temporal 
resolution, such as 100 frames per second, and which are suit- 
able for realtime image analysis (those that do not buffer their 
images to on-camera memory), have relatively low spatial res- 
olution. To have high spatial resolution over a large tracking 
volume, many cameras are required. Therefore, the use of 
multiple cameras enables tracking over large, behaviorally- 
and ecologically- relevant spatial scales with high spatial and 
temporal resolution while minimizing the effects of occlusion. 
The use of miiltiple cameras also gives the system its name, 
flydra, from 'fly', our primary experimental animal, and the 
mythical Greek multi- headed serpent 'hydra'. 

Flydra is largely composed of standard algorithms, hard- 
ware, and software. Our effort has been to integrate these 
disparate pieces of technology into one coherent, working sys- 
tem with the important property that the multi-target track- 
ing algorithm operates with low latency during experiments. 

1.1 System overview 

A Bayesian framework provides a natural formalism to de- 
scribe our multi-target tracking approach. In such a frame- 
work, previously held beliefs are called the a priori, or prior, 
probability distribution of the state of the system. Incoming 
observations are used to update the estimate of the state into 
the a posteriori, or posterior, probability distribution. This 
process is often likened to human reasoning, whereby a per- 
son's best guess at some value is arrived at through a process 
of combining previous expectations of that value with new 
observations that inform about the value. 

The task of flydra is to flnd the maximum a posteriori 
(MAP) estimate of the state St of all targets at time t given 
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observations Zi-t from all time steps (starting with the first 
time step to the current time step), p{Si\Zi-j). Here, St 
represents the state (position and velocity) of all targets, 
St = (S(, . . . ,Sj*) where It is the number of targets at time 
t. Under the first-order Markov assumption, we can factorize 
the posterior as 

piSt\Z,.,t) ^ piZt\St) J piSt\St-i)p{St-i\Zut-i)dSt-i. (1) 

Thus, the process of estimating the posterior probability of 
target state at time t is a recursive process in which new 
observations are used in the model of observation likelihood 
p{Zi\Si). Past observations become incorporated into the 
prior, which combines the motion model p{St\St-i) with the 
target probability from the previous time step p{St-i\Zi.t_i). 

Flydra utilizes an Extended Kalman Filter to approximate 
the solution to equation 1, as described in Section 3.1. The ob- 
servation Zt for each time step is the set of all individual low- 
dimensional feature vectors containing image position infor- 
mation arising from the camera views of the targets (Section 
2). In fact, equation 1 neglects the challenges of data associa- 
tion (linking individual observations with specific targets) and 
targets entering and leaving the tracking volume. Therefore 
the Nearest Neighbor Standard Filter data association step is 
used to link individual observations with target models in the 
model of observation likelihood (Section 3.2), and the state 
update model incorporates the ability for targets to enter and 
leave the tracking vohmic (Section 3.2.3). The heuristics cm- 
ployed to implement the system typically were optimizations 
with regard to realtime performance and low latency rather 
than a compact form, and our system only approximates the 
full Bayesian solution rather than perfectly implements it. 
Nevertheless, the remaining sections of this manuscript ad- 
dress their relation to the global Bayesian framework where 
possible. Aspects of the system which were found to be im- 
portant for low-latency operation are mentioned. 

The general form of the apparatus is illustrated in Fig- 
ure lA and a flowchart of operations is given in Figure 2A. 
Digital cameras are connected (with an IEEE 1394 Fire Wire 
bus or a dedicated gigabit ethcrnet cable) to image process- 
ing computers that perform a background subtraction based 
algorithm to extract image features such as the 2D target po- 
sition and orientation in a given camera's image. From these 
computers, this 2D information is transmitted over a gigabit 
ethernet LAN to a central computer, which performs 2D-to- 
3D triangulation and tracking. Although the tracking results 
are generated and saved online, in realtime as the experiment 
is performed, raw image sequences can also be saved for both 
verification purposes as well as other types of analyses. Fi- 
nally, reconstructed flight trajectories, such as that of Figure 
IB, may then be subjected to further analysis, as in Figures 
3 and 9. 

(Figure 1 near here.) 
(Figure 2 near here.) 
(Figure 3 near here.) 



1.2 Related Work 

Several systems have allowed manual or manually-assisted 
digitization of the trajectories of freely flying animals. In the 
1970s, Land and Collett performed pioneering studies on the 
visual guidance of flight in blowflies (1974) and later, in hov- 
erflies (Collett and Land, 1975, 1978). By the end of the 70s 
and into the 80s, 3D reconstructions using two views of fly- 
ing insects were performed (Wehrhahn, 1979; Buelthoff et al., 
1980; Wehrhahn et al., 1982; Dahmen and Zeil, 1984; Wagner, 
1986). In one case, the shadow of a bee on a planar white sur- 
face was used as a second view to perform 3D reconstruction 
(Srinivasan et al., 2000). Today, hand digitization is still used 
when complex kinematics, such as wing shape and position, 
are desired, such as in Drosophila (Fry et al., 2003), cockatoo 
(Hedrick and Biewener, 2007) and bats (Tian et al., 2006). 

Several authors have solved similar automated multi-target 
tracking problems using video. For example. Khan et al. 
(2005) tracked multiple, interacting ants in 2D from a sin- 
gle view using particle filtering with a Markov Chain Monte 
Carlo sampling step to solve the multi-target tracking prob- 
lem. Later work by the same authors (Khan et al., 2006) 
achieved realtime speeds through the use of sparse updat- 
ing techniques. Branson et al. (2009) addressed the same 
problem for walking flies. Their technique uses background 
subtraction and clustering to detect flies in the image, and 
casts the data association problem as an instance of min- 
imum weight bipartite perfect matching. In implementing 
flydra, we found the simpler system described here to be suf- 
ficient for tracking position of flying flies and hummingbirds 
(see Section 5). In addition to tracking in three dimensions 
rather than two, a key difference between the work described 
about and those addressed in the present work is that the in- 
teractions between our animals are relatively weak (see Sec- 
tion 3.2, especially equation 7), and we did not find it nec- 
essary to implement a more advanced tracker. Nevertheless, 
the present work could be used as the basis for a more ad- 
vanced tracker, such as one using a particle filter (e.g. Klein, 
2008). In that case, the posterior from the Extended Kalman 
Filter (see Section 3.1) could be used as the proposal distri- 
bution for the particle filter. Others have; de(;cntralized the 
multiple object tracking problem to improve performance, es- 
pecially when dealing with dynamic occlusions due to targets 
occluding each other (e.g. Qu et al., 2007a, b). Additionally, 
tracking of dense clouds of starlings (Ballerini et al., 2008; 
Cavagna et al., 2008c,b,a) and fruit flies (Wu et al., 2009; Zou 
et al., 2009) has enabled detailed investigation of swarms, al- 
though these systems are currently incapable of operating in 
realtime. By filming inside a corner-cube reflector, multiple 
(real and reflected) images allowed Bomphrcy et al. (2009) to 
track flies in 3D with only a single camera, and the tracking 
algorithm presented here could make use of this insight. 

Completely automated 3D animal tracking systems have 
more recently been created such as systems with two cameras 
that track flies in realtime (Marden et al., 1997; Fry et al., 
2004, 2008). The system of Grover et al. (2008), similar in 
many respects to the one we describe here, tracks the visual 
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hull of flies using three cameras to reconstruct a polygonal 
model of the 3D shape of the flics. Our system, briefly de- 
scribed in a simpler, earlier form in Maimon et al. (2008), dif- 
fers in several ways. First, flydra has a design goal of tracking 
over large volumes, and, as a result of the associated limited 
spatial resolution (rather than due to a lack of interest), fly- 
dra is concerned only with the location and orientation of 
the animal. Second, to facilitate tracking over large volumes, 
the flydra system uses a data association step as part of the 
tracking algorithm. The data association step allows flydra to 
deal with additional noise (false positive feature detections) 
when dealing with low contrast situations often present when 
attempting to track in large volumes. Third, our system uti- 
lizes a general multi view geometry, whereas the system of 
Grover et al. (2008) utilizes epipolar geometry, limiting its 
use to three cameras. Finally, although their system oper- 
ates in realtime, no measurements of latency were provided 
by Grover et al. (2008) with which to compare our measure- 
ments. 



1.3 Notation 

In the equations to follow, letters in a bold, roman font signify 

a vector, which may be specified by components enclosed in 
parentheses and separated by commas. Matrices are written 
in roman font with uppercase letters. Scalars are in italics. 
Vectors always act like a single column matrix, such that for 
vector V = (a, b, c), the multiplication with matrix M is Mv = 
a 



M [v] = M 



2 2D feature extraction 

The first stage of processing converts digital images into a list 
of feature points using an elaboration of a background sub- 
traction algorithm. Because the image of a target is usually 
only a few pixels in area, an individual feature point from a 
given camera characterizes that camera's view of the target. 
In other words, neglecting missed detections or false positives, 
there is usually a one-to-one correspondence between targets 
and extracted feature points from a given camera. Never- 
theless, our system is capable of successful tracking despite 
missing observations due to occlusion or low contrast (see 
Section 3.1) and rejecting false positive feature detections (as 
described in Section 3.2). 

In the Bayesian framework, all feature points for time t are 
the observation Zt- The ith of n cameras returns m feature 
points, with each point Zjj being a vector Zjj = (u, v, a, /3, 9, e) 
where u and v are the coordinates of the point in the image 
plane and the remaining components are local image statistics 
described below. Zt thus consists of all such feature points for 
a given frame Zt = {zn, . . . ,Zi„, . . . , z„i, . . . ,z„m}. (In the 
interest of simplified notation, our indexing scheme is slightly 
misleading here — there may be varying numbers of features 
for each camera rather than always m as suggested.) 



The process to convert a new image to a series of feature 
points uses a process based on background subtraction using 
the running Gaussian average method (reviewed in Piccardi, 
2004). To achieve fast image processing required for realtime 
operation, many of these operations are performed utilizing 
the high-performance single instruction multiple data (SIMD) 
extensions available on recent x86 CPUs. Initially, an abso- 
lute difference image is made, where each pixel is the absolute 
value of the difference between the incoming frame and the 
background image. Feature points that exceed some thresh- 
old difference from the background image are noted and a 
small region around each pixel is subjected to further anal- 
ysis. For the jth feature, the brightest point has value (3j 
in this absolute difference image. All pixels below a certain 
fraction (e.g. 0.3) of (3j are set to zero to reduce moment 
arms caused by spurious pixels. Feature area aj is found 
from the 0th moment, the feature center (?ij, Vj) is calculated 
from the 1st moment and the feature orientation 9j and ec- 
centricity Cj are calculated from higher moments. After cor- 
recting for lens distortion (see Section 4), the feature center 
is {uj,Vj). Thus, the jth point is characterized by the vec- 
tor Zj = (uj,Vj,aj, Pj,6j,ej). Such features are extracted 
on every frame from every camera, although the number of 
points m found on each frame may vary. We set the initial 
thresholds for detection low to minimize the number of missed 
detections- false positives at this stage are rejected later by 
the data association algorithm (Section 3.2). 

Our system is capable of dealing with illumination condi- 
tions that vary slowly over time by using an ongoing esti- 
mate of the background luminance and its variance, which 
are maintained on a per-pixel basis by updating the current 
estimates with data from every 500th frame (or other arbi- 
trary interval). A more sophisticated 2D feature extraction 
algorithm could be used, but we have found this scheme to be 
sufficient for our purposes and sufficiently simple to operate 
with minimal latency. 

While the realtime operation of flydra is essential for ex- 
periments modifying sensory feedback, another advantage of 
an online tracking system is that the amount of data required 
to be saved for later analysis is greatly reduced. By perform- 
ing only 2D feature extraction in realtime, to reconstruct 3D 
trajectories later, only the vectors Zj need be saved, resulting 
in orders of magnitude less data than the full-frame camera 
images. Thus, to achieve solely the low data rates of real- 
time tracking, the following sections dealing with 3D are not 
necessary to be implemented for this benefit of realtime use. 
Furthermore, raw images taken from the neighborhood of the 
feature points could also be extracted and saved for later anal- 
ysis, saving slightly more data, but still at rates substantially 
less than the full camera frames provide. This fact is partic- 
ularly useful for cameras with a higher data rate than hard 
drives can save, and such a feature is implemented in flydra. 

(Figure 4 near here.) 

Figure 4 shows the parameters (u, v, 9) from the 2D feature 
extraction algorithm during a hummingbird flight. These 2D 
features, in addition to 3D reconstructions, are overlaid on 
raw images extracted and saved using the realtime image ex- 
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traction technique described above. 

3 Multi target tracking 

The goal of flydra, as described in Section 1.1, is to find the 
MAP estimate of the state of all targets. For simplicity, we 
model interaction between targets in a very limited way. Al- 
though in many cases the animals we are interested in track- 
ing do interact (for example, hummingbirds engage in com- 
petition in which they threaten or even contact each other), 
mathematically limiting the interaction facilitates a reduction 
in computational complexity. First, the process update is in- 
dependent for each A;th animal 

p{St\St-i) ^llp{St,k\St-i,k)- (2) 

k 

Second, we implemented only a slight coupling between tar- 
gets in the data association algorithm. Thus, the observation 
likelihood model p{Zt\St) is independent for each target with 
the exception described in Section 3.2.1, and making this as- 
sumption allows use of the Nearest Neighbor Standard Filter 
as described below. 

Modeling individual target states as mostly independent al- 
lows the problem of estimating the MAP of joint target state 
S to be treated nearly as I independent, smaller problems. 
One benefit of making the assumption of target independence 
is that the target tracking and data association parts of our 
system are parallelizable. Although not yet implemented in 
parallel, our system is theoretically capable of tracking very 
many (tens or hundreds) targets simultaneously with low la- 
tency on a computer with sufficiently many processing units. 

The cost of this near-indepdence assumption is reduced 
tracking accuracy during periods of near contact (see Sec- 
ton 3.2.1). Data from these periods could be analyzed later 
using more sophisticated multi-target tracking data associa- 
tion techniques, presumably in an offline setting, especially 
because such periods could be easily identified using a sim- 
ple algorithm. All data presented in this paper utilized the 
procedure described here. 

3.1 Kalman filtering 

The standard Extended Kalman Filter (EKF) approximately 
estimates statistics of the posterior distribution (equation 1) 
for non-linear processes with additive Gaussian noise (details 
are given in Appendix A). To utilize this framework, we make 
the assumption that noise in the relevant processes is Gaus- 
sian. Additionally, our target independence assumption al- 
lows a single Kalman filter implementation to be used for each 
tracked target. The EKF estimates state and its covariance 
based on a prior state estimate and incoming observations 
by using models of the state update process, the observation 
process, and estimates of the noise of each process. 

We use a linear model for the dynamics of the system and 
a nonlinear model of the observation process. Specifically, 



the time evolution of the system is modeled with the linear 
discrete stochastic model 

St = Ast_i -I- w. (3) 

We treat the target as an orientation-free particle, with state 
vector s = (a;, y, z, x, y, z) describing position and velocity in 
3D space. The process update model A represents, in our 
case, the laws of motion for a constant velocity particle 
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with dt being the time step. Maneuvering of the target (devi- 
ation from the constant velocity) is modeled as noise in this 
formulation. The random variable w represents this process 
update noise with a normal probability distribution with zero 
mean and the process covariance matrix Q. Despite the use 
of a constant velocity model, the more complex trajectory of 
a fly or other target is accurately estimated by updating the 
state estimate with frequent observations. 

(Figure 5 near here.) 

For a given data association hypothesis (see Section 3.2), a 
set of observations is available for target k. A nonlinear ob- 
servation model, requiring use of an extended Kalman filter, 
is used to describe the action of a projective camera (equa- 
tions 4-5 below). This allows observation error to be modeled 
as Gaussian noise on the image plane. Furthermore, during 
tracking, triangulation happens only implicitly, and error es- 
timates of target position are larger along the direction of 
the ray between the target and the camera center. (To be 
clear, explicit triangulation is performed during the initial- 
ization of a Kalman model target, as explained in Section 
3.2.) Thus, observations from alternating single cameras on 
successive frames would be sufiicient to produce a 3D esti- 
mate of target position. For example. Figure 5 shows a re- 
constructed fly trajectory in which two frames were lacking 
data from all but one camera. During these two frames, the 
estimated error increased, particularly along the camera-fly 
axis, and no triangulation was possible. Nevertheless, an es- 
timate of 3D position was made and appears reasonable. The 
observation vector y = (ui, wi, U2, W2, ••■,■"„,«„) is the vector 
of the distortion-corrected image points from n cameras. The 
non-linear observation model relates to the state St by 

y, = /i(st)+v (4) 

where St is the state at time t, the function h models the 
action of the cameras and v is observation noise. The vector- 
valued h{s) is the concatenation of the image points found by 
applying the image point equations (equations B-2 and B-3) 
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to each of the n cameras 

h{s) = (/ii(s),...,/i„(s)) 

= {ui,Vi,...,Un,Vn) 

= (?^(PiX),...,H(P„X)). 

The overbar Q denotes a noise-free prediction to which the 
zero-mean noise vector v is added, and X is the homogeneous 
form of the first three components of s. The random variable 
V models the observation noise as normal in the image plane 
with zero mean and covariance matrix R. 

At each time step t, the extended Kalman filter formulation 
is then used to estimate the state s in addition to the error 
P (see Appendix A). Together, the data associated with each 
target is F = {s,P}. With the possibility of multiple targets 
being tracked simultaneously, the fcth target is assigned T''. 

One issue we faced when implementing the Kalman filter 
was parameter selection. Our choice of parameters was done 
through educated guesses followed by an iterated trial-and- 
error procedure using several different trajectories' observa- 
tions. The parameters that resulted in trajectories closest to 
those seen by eye and with least magnitude error estimate 
P were used. We obtained good results, for fruit flies mea- 
sured under the conditions of our setups, with the process 
covariance matrix Q being diagonal, with the first three en- 
tries being 100 mm^ and the next three being 0.25 (m/sec)^. 
Therefore, our model treats maneuvering as position and ve- 
locity noise. For the observation covariance matrix R, we 
found good results with a diagonal matrix with entries of 1, 
corresponding to variance of the observed image positions of 
one pixel. Parameter selection could be automated by an 
expectation-maximization algorithm, but we found this was 
not necessary. 

Another issue is missing data — in some time steps, all views 
of the fly may be occluded or low contrast, leaving a missing 
value of y for that time step. In those cases, we simply set the 
a posteriori estimate to the a priori prediction, as follows from 
equation 1. In these circumstances, the error estimate P grows 
by the process covariance Q, and does not get reduced by 
(nonexistant) new observations. This follows directly from the 
Kalman filter equations (Appendix A). If too many successive 
frames with no observations occur, the error estimate will 
exceed a threshold and tracking will be terminated for that 
target (described in Section 3.2.3). 

3.2 Data association 

One simplification made in the system overview (Section 1.1) 
was to neglect the data association problem — the assign- 
ment of observations to targets. We address the problem by 
marginalizing the observation likelihood across hidden data 
association variables V, where each V corresponds to a differ- 
ent hypothesis about how the feature points correspond with 
the targets. Thus, the model of observation likelihood from 
equation 1 becomes 

?>(Zt|5t)=^p(Zt,I?|5t). (6) 



In fact, computing probabilities across all possible data as- 
sociation hypothc;sc!s V across multiple time steps would result 
in a combinatorial explosion of possibilities. Among the vari- 
ous means of limiting the amount of computation required by 
limiting the number of hypotheses considered, we have cho- 
sen a simple method, the Nearest Neighbor Standard Filter 
(NNSF) data association algorithm run on each target inde- 
pendently (Bar-Shalom and Fortmann, 1988). This algorithm 
is sufficiently efficient to operate in realtime for typical con- 
ditions of our system. Thus, we approximate the sum of all 
data association hypotheses with the single best hypothesis 
I'nnsf, defined to be the NNSF output for each of the k in- 
dependent targets 

p{Zt,V^^SF\St)^Y^p{Zt,V\St). (7) 

This implies that we assume hypotheses other than I?nnsf 
have vanishingly small probability. Errors due to this as- 
sumption being false could be corrected in a later, offline pass 
through the data keeping track of more data association hy- 
potheses using other algorithms. 

2? is a matrix with each column being the data association 
vector for target k such that V = [d^ ... d'' ... ] . This 
matrix has n rows (the number of cameras) and I columns (the 
number of active targets) . The data association vector d'^ for 
target k has n elements of value null or index j of the feature 
Zj assigned to that target. As described below (Section 3.2.4), 
these values are computed from the predicted location of the 
target and the features returned from the cameras. 

3.2.1 Preventing track merging 

One well known potential problem with multi target tracking 
is the undesircd merging of target trajectories if targets begin 
to share the same observations. Before implementing the fol- 
lowing rule, flydra would sometimes suffer from this merging 
problem when tracking hummingbirds cnigaged in territorial 
competition. In such fights, male hummingbirds often fly di- 
rectly at each other and come in physical contact. To prevent 
the two trajectories from merging in such cases, a single pass 
is made through the data association assignments after each 
frame. In the case that more than one target was assigned the 
exact same subset of feature points, a comparison is made be- 
tween the observation and the predicted observation. In this 
case, only the target corresponding to the closest prediction 
is assigned the data, and the other target is updated without 
any observation. We found this procedure to require mini- 
mal additional computational cost, while still being effective 
in preventing trajectory merging. 

3.2.2 NNSF and generative model of image features 

To implement the NNSF algorithm, we implement a genera- 
tive model of feature appearance based on the prior estimate 
of target state. By predicting target position in an incoming 
frame based on prior information, the system selects 2D image 
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points as being likely to come from the target by gating un- 
likely observations, thus limiting the amount of computation 

performed. 

(Figure 6 near here.) 

Recall from Section 2 that for each time t and camera i, 
m feature points are found with the jth point being Zj = 
(uj , Vj , aj , Pj , Oj , ) • The distortion-corrected image coordi- 
nates are (it, v), while a is the area of the object on the image 
plane measured by thresholding of the difference image be- 
tween the current and background image, /3 is an estimate 
of the maximum difference within the difference image, and 
9 and e are the slope and eccentricity of the image feature. 
Each camera may return multiple candidate points per time 
step, with all points from the -ith camera represented as Z^, 
a matrix whose columns are the individual vectors Zj, such 
that Zi — [zi ... Zjn] ■ The purpose of the data association 
algorithm is to assign each incoming point z to an existing 
Kalman model F, to initialize a new Kalman model, or at- 
tribute it to a false positive (a null target). Furthermore, old 
Kalman models for which no recent observations exist due 
to the target leaving the tracking volume must be deleted. 
The use of such a data association algorithm allows flydra to 
track multiple targets simultaneously, as in Figure 6, and, by 
reducing computational costs, allows the 2D feature extrac- 
tion algorithm to return many points to minimize the number 
of missed detections. 



3.2.3 Entry and exit of targets 

How does our system deal with existing targets losing visibil- 
ity due to leaving the tracking volume, occlusion or lowered 
visual contrast? What happens when new targets become vis- 
ible? We treat such occurrences as part of the update model 
in the Bayesian framework of Section 1.1. Thus, in the ter- 
minology from that section, our motion model for all targets 
p{St\St-i) includes the possibility of initializing a new target 
or removing an existing target. This section describes the 
procedure followed. 

For all data points z that remained 'unclaimed' by the pre- 
dicted locations of pre-existing targets (see Section 3.2.4 be- 
low), we use an unguided hypothesis testing algorithm. This 
triangulates a hypothesized 3D point for every possible com- 
bination of 2,3,. ..,n cameras, for a total of (2) + (3) + ■•■ 
-|- (") combinations. Any 3D point with reprojection error 
less than an arbitrary threshold using the greatest mimber 
of cameras is then used to initialize a new Kalman filter in- 
stance. The initial state estimate is set to that 3D position 
with zero velocity and a relatively high error estimate. Track- 
ing is stopped (a target is removed) once the estimated error 
P exceeds a threshold. This most commonly happens, for 
example, when the target leaves the tracking area and thus 
receives no observations for a given number of frames. 

3.2.4 Using incoming data 

Ultimately, the purpose of the data association step is to de- 
termine which image feature points will be used by which 



target. Given the target independence assumption, each tar- 
get uses incoming data independently. This section outlines 
how the data association algorithm is used to determine the 
feature points treated as the observation for a given target. 

To utilize the Kalman filter described in Section 3.1, obser- 
vation vectors must be formed from the incoming data. False 
positives must be separated from correct detections, and, be- 
cause multiple targets may be tracked simultaneously, cor- 
rect detections must be associated with a particular Kalman 
model. At the beginning of processing for each time step t 
for the fcth Kalman model, a prior estimate of target position 
and error rj|j_^ = {s^\^_l,P^^_.^^} is available. It must be 
determined which, if any, of the m image points from the ith 
camera is associated with the fcth target. Due to the realtime 
requirements for our system, flydra gates incoming detections 
on simple criteria before performing more computationally 
intensive tasks. 

For target k at time t, the data association function g is 



4 = fl(Zi, 



7 V 

t ^ny '- t\t- 



(8) 



This is a function of the image points Zj from each of the n 
cameras and the prior information for target fc. The assign- 
ment vector for the fcth target, d'', defines which points from 
which cameras view a target. This vector has a component 
for each of the n cameras, which is either null (if that camera 
doesn't contribute) or is the column index of Zj corresponding 
to the associated point. Thus, d'^ has length n, the number 
of cameras, and no camera may view the same targcit more 
than once. Note, the fc and t superscript and subscript on d 
indicate the assignment vector is for target k at time step t, 
whereas below (Equation 9) , the subscript i is used to indicate 
the ith component of the vector d. 

The data association function g may be written in terms of 
the components of d. The ith component is the index of the 
columns of Zj that maximizes likelihood of the observation 
given the predicted target state and error and is defined to be 



di = argmax(p(zj|r)), zj e Zj 

j 



(9) 



Our likelihood function gates detections based on two condi- 
tions. First, the incoming detected location (uj.Vj) must be 
within a threshold Euclidean distance from the estimated tar- 
get location projected on the image. The Euclidean distance 
on the image plane is 



dist2d = d, 



euclid 



,^(PiX)) 



(10) 



where 7/(PjX) finds the projected image coordinates of X, 
where X is the homogeneous form of the first three compo- 
nents of s, the expected 3D position of the target. The func- 
tion H and camera matrix P, are described in Appendix B. 
The gating can be expressed as an indicator function 



Ldist2d 



iuj,Vj) 



1 if dist2d < threshdist2d, 
otherwise. 



(11) 
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Second, the area of the detected object {aj) must be greater 
than a threshold value, expressed as 

1 ^ _ / I if aj > thresharea, 

lareal^iJ - |q otherwise. ^ ' 

If these conditions arc met, the distance of the ray connecting 
the camera center and 2D point on the image plane {uj,Vj) 
from the expected 3D location a is used to further determine 
likelihood. Wc use the Mahalanobis distance, which for a 
vector a with an expected value of a with covariance matrix 
E is 

dmahai{a,a.) = -^(a-a)TS-i(a-a). (13) 

Because the distance function is convex for a given a and E, 
we can solve directly for the closest point on the ray, by setting 
a equal to the first three terms of St\t-i and E to the upper left 
3x3 submatrix of Pt|i_i. Then, if the ray is a parametrized 
line of the form £{s) = s ■ {a, b, c) + {x, y, z) where (a, 5, c) is 
the direction of the ray formed by the image point (u, v) and 
the camera center and {x, y, z) is a point on the ray, we can 
find the value of s for which the distance between £(s) and 
a is minimized by finding the value of s where derivative of 
(£(s), a) is zero. If we call this closest point a and 
combine equations 11-13, then our likelihood function is 

p(z,|r) = ldist2d(w,-,t^j) larea(ai) e-''-'-'^^'^) . (14) 

Note that, due to the multiplication, if either of the first two 
factors are zero, the third (and more computationally expen- 
sive) condition need not be evaluated. 

4 Camera and lens calibration 

Camera calibrations may be obtained in any way that pro- 
duces camera calibration matrices (described in Appendix B) 
and, optionally, parameters for a model of the non-linear dis- 
tortions of cameras. Good calibrations are critical for fiydra 
because, as target visibility changes from one subset of cam- 
eras to another subset, any misalignment of the calibrations 
will introduce artifactual movement in the reconstructed tra- 
jectories. Typically, we obtain camera calibrations in a two 
step process. First, the Direct Linear Transformation (DLT) 
algorithm (Abdel-Aziz and Karara, 1971) directly estimates 
camera calibration matrices that could be used for triangu- 
lation as described in Appendix B. However, because we use 
only about 10 manually digitized corresponding 2D/3D pairs 
per camera, this calibration is relatively low-precision and, as 
performed, ignores optical distortions causing deviations from 
the linear simple pinhole camera model. Therefore, an auto- 
mated Multi-Camera Self Calibration Toolbox (Svoboda ct al., 
2005) is used as a second step. This toolbox utilizes inher- 
ent numerical redundancy when multiple cameras are view- 
ing a common set of 3D points through use of a factorization 
algorithm (Sturm and Triggs, 1996) followed by bundle ad- 
justment (reviewed in Section 18.1 of Hartley and Zisserman, 
2003). By moving a small LED point light source through 
the tracking volume (or, indeed, a freely flying fly), hundreds 



of corresponding 2D/3D points are generated which lead to 
a single, ovcrdetcrmined solution which, without knowing the 
3D positions of the LED, is accurate up to a scale, rotation, 
and translation. The camera centers, either from the DLT 
algorithm or measured directly, arc then used to find the best 
scale, rotation, and translation. As part of the Multi-Camera 
Self Calibration Toolbox, this process may be iterated with 
an additional step to estimate non-linear camera parameters 
such as radial distortion (Svoboda et al., 2005) using the Cam- 
era Calibration Toolbox of Bouguet. Alternatively, we have 
also implemented the method of Prcscott and McLean (1997) 
to estimate radial distortion parameters before use of Svo- 
boda's toolbox, which we found necessary when using wide 
angle lenses with significant radial distortion (e.g. 150 pixels 
in some cases). 

5 Implementation and evaluation 

We built three different flydra systems: a five camera, six 
computer 100 fps system for tracking fruit flies in a 0.3m 
X 0.3m X 1.5m arena (e.g. Figures 1, 9 and Maimon et al. 
2008, which used the same hardware but a simpler version of 
the tracking software), an eleven camera, nine computer 60 
fps system for tracking fruit flies in a large — 2m diameter x 
0.8m high cylinder (e.g. Figure 5) and a four camera, five 
computer 200 fps system for tracking hummingbirds in a 1.5m 
X 1.5m X 3m arena (e.g. Figure 4). Apart from the low-level 
camera drivers, the same software is running on each of these 
systems. 

We used the Python computer language to implement fly- 
dra. Several other pieces of software, most of which arc open 
source, are instrumental to this system: motmot (Straw and 
Dickinson, 2009), pinpoint, pytables, numpy, scipy, Pyro, wx- 
Python, tvtk, VTK, matplotlib, PyOpcnGL, pyglet, Pyrex, 
cython, ctypes, Intel IPP, ATLAS, libdcl394, PTPd, gcc, and 
Ubuntu. We used Intel Pentium 4 and Core 2 Duo based com- 
puters. 

The accuracy of 3D reconstructions was verified in two 
ways. First, the distance between 2D points projected from a 
3D estimate derived from the originally extracted 2D points 
is a measure of calibration accuracy. For all figures shown, 
the mean reprojection error was less than one pixel, and for 
most cameras in most figures, was less than 0.5 pixels. Sec- 
ondly, the 3D coordinates and distances between coordinates 
measured through triangulation were verified against values 
measured physically. For two such measurements in the sys- 
tem shown in Figure 1, these values were within 4 percent. 

(Figure 7 near here.) 

We measured the latency of the 3D reconstruction by syn- 
chronizing several clocks involved in our experimental setup 
and then measuring the duration between onset of image ac- 
quisition and completion of the computation of target posi- 
tion. When flydra is running, the clocks of the various com- 
puters are synchronized to within 1 microsecond by PTPd, the 
precise time protocol daemon, an implementation of the IEEE 
1588 clock synchronization protocol (Correll et al., 2005). Ad- 
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ditionally, a microcontroller (AT90USBKEY, Atmel, USA) 

running custom firmware is connected over USB to the cen- 
tral reconstruction computer and is synchronized using an al- 
gorithm similar to PTPd, allowing the precise time of frame 
trigger events to be known by processes running within the 
computers. Measurements were made of the latency between 
the time of the hardware trigger pulse generated on the micro- 
controller to start acquisition of frame t and the moment the 
state vector s^^t was computed. These measurements were 
made with a central computer being a 3 GHz Intel Core 2 
Duo CPU. As shown in Figure 7, the median 3D reconstruc- 
tion timestamp is 39 msec. Further investigation showed the 
makeup of this delay. Prom the specifications of the cameras 
and bus used, 19.5 msec is a lower bound on the latency of 
transferring the image across the IEEE 1394 bus (and could 
presumably be reduced by using alternative technologies such 
as Gigabit Ethernet or Camera Link). Further measurements 
on this system showed that 2D feature extraction takes 6-12 
msec, and that image acquisition and 2D feature extraction 
together take 26-32 msec. The remainder of the latency to 
3D estimation is due to network transmission, triangulation, 
and tracking, and, most likely, non-optimal queueing and or- 
dering of data while passing it between these stages. Further 
investigation and optimization has not been performed. 

6 Experimental Possibilities 

A few examples serve to illustrate some of the capabilities of 
flydra. We are actively engaged in understanding the sensory- 
motor control of flight in the fruit fly Drosophila melanogaster. 
Many basic features of the effects of visual stimulation on the 
flight of flies are known, and the present system allows us to 
characterize these phenomena in substantial detail. For exam- 
ple, the presence of a vertical landmark such as a black post on 
a white background greatly influences the structure of flight, 
causing flies to 'fixate', or turn towards, the post (Kennedy, 
1939). By studying such behavior in free flight (e.g. Figure 
3), we have found that flies approach the post until some small 
distance is reached, and then often turn rapidly and fly away. 

(Figure 8 near here.) 

Because we have an online estimate of fly velocity and po- 
sition in s, we can predict the future location of the fly. Of 
course, the quality of the prediction declines with the du- 
ration of extrapolation, but it is sufficient for many tasks, 
even with the latency taken by the 3D reconstruction itself. 
One example is the triggering of high resolution, high speed 
cameras (e.g. 1024x1024 pixels, 6000 frames per second as 
shown in Figure 8A). Such cameras typically buffer their im- 
ages to RAM and are downloaded offiine. We can construct 
a trigger condition based on the position of an animal (or a 
recent history of position, allowing triggering only on specific 
maneuvers). Figure 8 A shows a contrast-enhancing false color 
montage of a fly makes a close approach to a post before leav- 
ing. By studying precise movements of the wings and body in 
addition to larger scale movement through the environment, 
we are working to understand the neural and biomechanical 



mechanisms underlying control of flight. 

An additional possibility enabled by low latency 3D state 
estimates is that visual stimuli can be modulated to produce 
a 'virtual reality' environment in which the properties of the 
visual feedback loop can be artificially manipulated (Fry et al., 
2004, 2008; Strauss et al., 1997; Straw, 2008). In these types 
of experiments, it is critical that moving visual stimuli do not 
affect the tracking. For this reason, we illuminate files with 
near-IR light and use high pass filters in front of the camera 
lenses (e.g. R-72, Hoya Corporation). Visual cues provided 
to the; files are in the blue-green range. 

By estimating the orientation of the animal, approximate 
reconstructions of the visual stimulus experienced by the an- 
imal may be made. For example, to make Figure; 8B. a simu- 
lated view through the compound eye of a fly, we assumed 
that the fly head, and thus eyes, was oriented tangent to 
the direction of travel, and that the roll angle was fixed at 
zero. This information, together with a 3D model of the envi- 
ronment, was used to generate the reconstruction (Neumann, 
2002; Dickson et al., 2008). Such reconstructions are infor- 
mative for understanding the nature of the challenge of navi- 
gating visually with limited spatial resolution. Although it is 
known that files do move their head relative to their longitudi- 
nal body axis, these movements are generally small (Schilstra 
and Hateren, 1999) , and thus the errors resulting from the as- 
sumptions listed above could be reduced by estimating body 
orientation using the method described in Appendix B. Be- 
cause a fly's eyes are fixed to its head, further reconstruc- 
tion accuracy could be gained by fixing the head relative to 
the body (by gluing the head to the thorax), although the 
behavioral effects of such a manipulation would need to be 
investigated. 

Finally, because of the configurability of the system, it is 
feasible to consider large scale tracking in naturalistic environ- 
ments that would lead to greater understanding of the natural 
history of flies (Stamps et al., 2005) or other animals. 

7 Effect of contrast on speed regula- 
tion in Drosophila 

At low levels of luminance contrast, when differences in the 
luminance of different parts of a scene are minimal, it is diffi- 
cult or impossible to detect motion of visual features. In flies, 
which judge self-motion (in part) using visual motion detec- 
tion, the exact nature of contrast sensitivity has been used as 
a tool to investigate the fundamental mechanism of motion 
detection using electrophysiological recordings. At low con- 
trast levels, these studies have found a quadratic relationship 
between membrane potential and luminance contrast (Dvo- 
rak et al., 1980; Egelhaaf and Borst, 1989; Srinivasan and 
Dvorak, 1980). This result is consistent with Hassenstein- 
Reichardt correlator model for elementary motion detection 
(the HR-EMD, Hassenstein and Reichardt, 1956), and these 
findings are part of the evidence to support the hypothesis 
that the fly visual system implements something very similar 
to this mathematical model. 
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Despite these and other electrophysiological findings sug- 
gesting the HR-EMD may underlie fly motion sensitivity, 
studies of flight speed regulation and other visual behaviors 
in freely flying flies (David, 1982) and honey bees (Srinivasan 
et al., 1991; Si ct al., 2003; Srinivasan ct al., 1993) show that 
the free-flight behavior of these insects is inconsistent with a 
flight velocity regulator based on a simple HR-EMD model. 
More recently, Baird et al. (2005) have shown that over a 
large range of contrasts, flight velocity in honey bees is nearly 
unaffected by contrast. As noted by those authors, however, 
their setup was unable to achieve true zero contrast due to im- 
perfections with their apparatus. They suggest that contrast 
adaptation (Harris et al., 2000) may have been responsible 
for boosting the responses to low contrasts and attenuating 
responses to high contrast. This possibility was supported 
by the finding that forward velocity was better regulated at 
the nominal "zero contrast" condition than in the presence of 
an axial stripe, which may have had the effect of preventing 
contrast adaptation while provide no contrast perpendicular 
to the direction of flight (Baird et al., 2005). 

To test the efl^ect of contrast on the regulation of flight speed 
in Drosophila melanogaster, we performed an experiment, the 
results of which are shown in Figure 9. In this setup, we found 
that when contrast was sufficiently high (Michelson contrast 
> 1.6), flies regulated their speed to a mean speed of 0.15 m/s 
with a standard deviation of 0.07. As contrast was lowered, 
the mean speed increased, as did variability of speed, suggest- 
ing that speed regulation suffered due to a loss of visual feed- 
back. To perform these experiments, a computer projector 
(DepthQ, modified to remove color filter wheel, Lightspeed 
Design, USA) illuminated the long walls and floor of a 0.3m x 
0.3m X 1.5m arena with a regular checkerboard pattern (5cm 
squares) of varying contrast and flxed luminance (2 cd/m^). 
The test contrasts were cycled, with each contrast displayed 
for 5 minutes. 20 females flies were released into the arena 
and tracked over 12 hours. Any flight segments more than 5 
cm from the walls, floor or ceiling were analyzed, although for 
the majority of the time flies were standing or walking on the 
floors or walls of the arena. Horizontal flight speed was mea- 
sured as the first derivative of position in the XY direction, 
and histograms were computed with each frame constituting 
a single sample. Because the identity of the flies could not be 
tracked for the duration of the experiment, the data contain 
pseudo-replication — some flies probably contributed more to 
the overall histogram than others. Nevertheless, the results 
from 3 separate experimental days with 20 new flies tested on 
each day were each qualitatively similar to the pooled results 
shown, which include 1760 total seconds of flight in which 
tracked fly was 5 cm or greater from the nearest arena sur- 
face and were acquired during 30 cumulative hours. 

The primary difference between our findings on the effect of 
contrast on flight speed in Drosophila melanogaster compared 
to that found in honey bees by Baird et al. (2005) is that at 
low contrasts (below 0.16 Michelson contrast), flight speed in 
Drosophila is faster and more variable. This difference could 
be due to several non-mutually exclusive factors: 1) Our arena 
may have fewer imperfections which create visual contrast. 2) 



Fruit flies may have a lower absolute contrast sensitivity than 
honey bees. 3) Fruit flies may have a lower contrast sensitivity 
at the luminance level of the experiments. 4) Fruit flies may 
have less contrast adaptation ability. Or 5) fruit flies may 
employ an alternate motion detection mechanism. 

Despite the difference between the present results in fruit 
flies from those of honey bees at low contrast levels, at high 
contrasts (above 0.16 Michelson contrast for Drosophila) flight 
speed in both species was regulated around a constant value. 
This suggests that the visual system has little trouble esti- 
mating self-motion at these contrast values and that insects 
regulate flight speed about a set point using visual informa- 
tion. 

(Figure 9 near here.) 
8 Conclusions 

The highly automated and real time capabilities of our system 
allow unprecedented experimental opportunities. We are cur- 
rently investigating the object approach and avoidance phe- 
nomenon of fruit flies illustrated in Figure 3. We are also 
studying maneuvering in solitary and competing humming- 
birds and the role of maneuvering in establishing dominance. 
One of the opportunities made possible by the molecular bi- 
ological revolution are powerful new tools that can be used 
to visualize and modify the activity of neurons and neural 
circuits. By precisely quantifying high level behaviors, such 
as the object attraction/repulsion described above, we hope 
to make use of these tools to approach the question of how 
neurons contribute to the process of behavior. 



Appendix A 



Extended Kalman fil- 
ter 



The extended Kalman filter was used as described in Section 
3.1. Here, we give the equations using the notation of this 
paper. Note that because only the observation process is non- 
linear, the process dynamics are specified with (linear) matrix 
A. 

The a priori predictions of these values based on the pre- 
vious frame's posterior estimates are 



and 



St|f-i — As(_i|(_i 



Pt|t_i = APt_i|t_iAT + Q. 



(A-1) 



(A-2) 



To incorporate observations, a gain term K is calculated to 
weight the innovation arising from the difference between the 
a priori state estimate St\t-i and the observation y 



Pt|t-iHj 



HtPt|t-iHj + R 



(A-3) 



The observation matrix, Hj, is defined to be the Jacobian 
of the observation function (equation 5) evaluated at the ex- 
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pected state 

St|t-1 

The posterior estimates are then 

st\t = st\t-i + Kt(yj - BtSt\t-i) (A-5) 

and 

Pi|t = (I-KtHt)Pt|t_i. 

Appendix B Triangulation 

The basic 2D-to-3D calculation finds the best 3D location for 

two or more 2D camera views of a point, and is implemented 
using a linear least-squares fit of the intersection of n rays 
defined by the 2D image points and 3D camera centers of 
each of the n cameras (Hartley and Zisserman, 2003). After 
correction for radial distortion, the image of a 3D point on 
the ith camera is {ui,Vi). For mathematical reasons, it is 
convenient to represent this 2D image point in homogeneous 
coordinates 

= iri,Si,ti), (B-1) 

such that Ui = ri/ti and Vi = Si/U. For convenience, we define 
the function 7i to convert from homogeneous to Cartesian 
coordinates, thus 

H{^) = {u,v) = {r/t,s/t). (B-2) 

The 3x4 camera calibration matrix Pj models the projec- 
tion from a 3D homogeneous point X = (Xi, X2, X^, X4) 
(representing the 3D point with inhomogeneous coordinates 
{x,y,z) = {X1/X4,, X2/X4, X3/X4)) into the image point: 

X, = PiX. (B-3) 

By combining the image point equation B-3 from two or more 
cameras, we can solve for X using the homogeneous linear 
triangulation method based on the singular value decompo- 
sition as described in Hartley and Zisserman (2003, Sections 
12.2 and A5.3). 

A similar approach can be used for reconstructing the ori- 
entation of the longitudinal axis of an insect or bird. Briefly, 
a line is fit to this axis in each 2D image (using u, v and 6 
from Section 2) and, together with the camera center, is used 
to represent a plane in 3D space. The best-fit line of inter- 
section of the n planes is then found with a similar singular 
value decomposition algorithm (Hartley and Zisserman, 2003, 
Section 12.7). 
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Figure 1: A) Schematic diagram of the multi-camera tracking system. B) A trajectory of a fly {Drosophila melanog aster) 
near a dark, vertical post. Arrow indicates direction of flight at onset of tracking. 
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Figure 2: A) Flowchart of operations. B) Schematic illustration of a 2D camera view showing the raw images (brown), 
feature extraction (blue), state estimation (black), and data association (red). C) 3D reconstruction using the Extended 
Kalman filter uses prior state estimates (open circle) and observations (blue lines) to construct a posterior state estimate 
(closed circle) and covariance ellipsoid (dotted ellipse). 
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Figure 3: A) Top view of the fly trajectory in Figure IB, showing several close approaches to and movements away from a 
dark post placed in the center of the arena. The arrow indicates initial flight direction. Two sequences are highlighted in 
color. The inset shows the coordinate system. B) The time-course of attraction and repulsion to the post is characterized 
by flight directly towards the post until only a small distance remains, at which point the fly turns rapidly and flies away. 
Approach angle ip is quantified as the difference between the direction of flight and bearing to the post, both measured in 
the horizontal plane. The arrow indicates the initial approach for the selected sequences. C) Angular velocity (measured 
tangent to the 3D flight trajectory) indicates that relatively straight flight is punctuated by saccades (brief, rapid turns). 
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Figure 4: Raw images, 2D data extracted from the images, and overlaid computed 3D position and body orientation of a 
hummingbird (Calypte anna). In these images, a blue circle is drawn centered on the 2D image coordinates {u,v). The blue 
line segment is drawn through the detected body axis {6 in Section 3.2) when eccentricity (e) of the detected object exceeds 
a threshold. The orange circle is drawn centered on the 3D estimate of position (x, y, z) reprojected through the camera 
calibration matrix P, and the orange line segment is drawn in the direction of the 3D body orientation vector. 
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Figure 5: Two seconds of a reconstructed Drosophila trajectory. A) Top view. B) Side view of same trajectory. Kalman filter 
based estimates of fly position s are plotted as dots at the center of ellipsoids, which are the projections of the multivariate 
normal specifled by the covariance matrix P. Additionally, position estimated directly by triangulation of 2D point locations 
(see Appendix B) is plotted with crosses. Fly began on the right and flew in the direction denoted by the arrow. Note that 
for two frames near the beginning, only a single camera contributed to the tracking and the error estimate increased. 
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Figure 6: Multiple flies tracked simultaneously. Each automatically segmented trajectory is plotted in its own color. Note 
that the dark green and cyan trajectories probably came from the same fly which, for a period near the 10th second, was not 
tracked due to a series of missed detections or leaving the tracking volume (see Section 3.2). A) First horizontal axis (x). B) 
Second horizontal axis (y). C) Vertical axis (z). 
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Figure 7: Latency of one tracking system. A histogram of the latency of 3D reconstruction was generated after tracking 20 
flies for 18 hours. Median latency was 39 msec. 




Figure 8: Example applications for flydra. A) By tracking flies in real-time, we can trigger high speed video cameras (6000 
fps) to record interesting events for later analysis. B) A reconstruction of the visual scene viewed through the left eye of 
Drosophila. Such visual reconstructions can be used to simulate neural activity in the visual system (e.g. Neumann, 2002; 
Dickson et al., 2008). 
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Figure 9: Drosophila melanogaster maintain a lower flight speed with lower variability as visual contrast is increased. Mean 
and standard deviation of flight speeds are shown in text, and each histogram is normalized to have equal area. Ambient 
illumination reflected from one surface after being scattered from other illuminated surfaces slightly reduced contrast from 
the nominal values shown here. 
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